
    Depth map compression via 3D region-based representation

    In 3D video, view synthesis is used to create new virtual views between encoded camera views. Errors in the coding of the depth maps introduce geometry inconsistencies in the synthesized views. In this paper, a new 3D plane representation of the scene is presented that improves the performance of current standard video codecs in the view synthesis domain. Two image segmentation algorithms are proposed to generate a color segmentation and a depth segmentation. Using both partitions, depth maps are segmented into regions without sharp discontinuities, without having to explicitly signal all depth edges. The resulting regions are represented using a planar model in the 3D world scene. This 3D representation allows efficient encoding while preserving the 3D characteristics of the scene. The 3D planes also open up the possibility of coding multiview images with a single representation.
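
    As a rough illustration of the planar region model (not the authors' implementation), the sketch below fits a plane d = a·x + b·y + c to the depth samples of one segmented region by least squares and re-synthesises the region from the three coefficients alone; the toy depth map and region mask are placeholders.

    import numpy as np

    def fit_region_plane(depth, mask):
        """Least-squares fit of a plane d = a*x + b*y + c to the depth values
        of one segmented region (a simplification of a full 3D world-space fit)."""
        ys, xs = np.nonzero(mask)              # pixel coordinates inside the region
        d = depth[ys, xs].astype(np.float64)   # observed depth samples
        A = np.stack([xs, ys, np.ones_like(xs)], axis=1).astype(np.float64)
        coeffs, *_ = np.linalg.lstsq(A, d, rcond=None)   # [a, b, c]
        return coeffs

    def reconstruct_region(depth_shape, mask, coeffs):
        """Re-synthesise the region's depth from the 3 plane coefficients only."""
        a, b, c = coeffs
        ys, xs = np.nonzero(mask)
        approx = np.zeros(depth_shape, dtype=np.float64)
        approx[ys, xs] = a * xs + b * ys + c
        return approx

    if __name__ == "__main__":
        depth = np.fromfunction(lambda y, x: 0.5 * x + 0.2 * y + 10, (64, 64))
        mask = np.zeros((64, 64), dtype=bool)
        mask[16:48, 16:48] = True              # toy "region"
        coeffs = fit_region_plane(depth, mask)
        print("plane coefficients:", coeffs)   # approximately [0.5, 0.2, 10]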

    Action tube extraction based 3D-CNN for RGB-D action recognition

    In this paper we propose a novel action tube extractor for RGB-D action recognition in trimmed videos. The action tube extractor takes a video as input and outputs an action tube. The method consists of two parts: spatial tube extraction and temporal sampling. The first part is built upon MobileNet-SSD and defines the spatial region where the action takes place. The second part is based on the structural similarity index (SSIM) and removes frames without obvious motion from the primary action tube. The final extracted action tube has two benefits: 1) a higher ratio of region of interest (the subjects of the action) to background; 2) most frames contain obvious motion change. We use a two-stream (RGB and depth) I3D architecture as our 3D-CNN model. Our approach outperforms state-of-the-art methods on the OA and NTU RGB-D datasets.
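
    A minimal sketch of the SSIM-based temporal sampling idea, assuming grayscale frames and using scikit-image's structural_similarity; the 0.95 similarity threshold is an illustrative choice, not a value taken from the paper.

    import numpy as np
    from skimage.metrics import structural_similarity as ssim

    def temporal_sampling(frames, ssim_thresh=0.95):
        """Keep a frame only when it differs enough from the last kept frame.
        `frames` is a list of grayscale uint8 images; the 0.95 threshold is an
        arbitrary choice for illustration."""
        if not frames:
            return []
        kept = [frames[0]]
        for frame in frames[1:]:
            score = ssim(kept[-1], frame, data_range=255)  # 1.0 = visually identical
            if score < ssim_thresh:                        # low similarity -> obvious motion
                kept.append(frame)
        return kept

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        static = [np.full((64, 64), 128, dtype=np.uint8)] * 5
        moving = [rng.integers(0, 255, (64, 64), dtype=np.uint8) for _ in range(5)]
        sampled = temporal_sampling(static + moving)
        print(f"kept {len(sampled)} of {len(static) + len(moving)} frames")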

    Picking groups instead of samples: a close look at Static Pool-based Meta-Active Learning

    Active learning techniques are used to tackle learning problems where obtaining training labels is costly. In this work we use meta-active learning to learn to select a subset of samples from a pool of unlabeled inputs for further annotation. This scenario is called static pool-based meta-active learning. We propose to extend existing approaches by performing the selection in a manner that, unlike previous works, bases the choice of each sample on the whole subset selected so far.
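
    To illustrate selection conditioned on the whole subset chosen so far (as opposed to scoring samples independently), the sketch below uses a simple farthest-point diversity rule as a stand-in for the learned meta-active-learning policy described in the paper; the pool features are synthetic.

    import numpy as np

    def greedy_subset(pool, k):
        """Pick k samples so that each new pick depends on everything already
        selected (farthest-point / max-min distance, a simple stand-in for a
        learned selection policy)."""
        selected = [0]                                   # arbitrary seed sample
        dists = np.linalg.norm(pool - pool[0], axis=1)   # distance to the selected set
        while len(selected) < k:
            nxt = int(np.argmax(dists))                  # farthest from the current subset
            selected.append(nxt)
            dists = np.minimum(dists, np.linalg.norm(pool - pool[nxt], axis=1))
        return selected

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        pool = rng.normal(size=(200, 16))                # unlabeled feature vectors
        print("indices to annotate:", greedy_subset(pool, 5))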

    Spatio-temporal road detection from aerial imagery using CNNs

    The main goal of this paper is to detect roads in aerial imagery recorded by drones. To achieve this, we propose a modification of SegNet, a deep fully convolutional neural network for image segmentation. In order to train this network, we have assembled a database containing videos of roads recorded from the point of view of a small commercial drone. Additionally, we have developed an image annotation tool based on the watershed technique to perform semi-automatic labeling of the videos in this database. The experimental results using our modified version of SegNet show a substantial improvement in performance on aerial imagery, achieving over 90% accuracy.
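
    A minimal sketch of watershed-based semi-automatic labeling with scikit-image, where sparse user scribbles are propagated to a dense road mask. This only mimics the idea behind the annotation tool; the synthetic image, marker values and scribble positions are placeholders.

    import numpy as np
    from skimage.filters import sobel
    from skimage.segmentation import watershed

    def label_from_scribbles(gray, scribbles):
        """Propagate sparse user scribbles (1 = road, 2 = background, 0 = unknown)
        to a dense mask using the watershed transform on the gradient image."""
        gradient = sobel(gray.astype(float))     # edges act as watershed barriers
        labels = watershed(gradient, markers=scribbles)
        return labels == 1                       # boolean road mask

    if __name__ == "__main__":
        gray = np.zeros((100, 100))
        gray[:, 40:60] = 1.0                     # bright vertical "road" strip
        scribbles = np.zeros((100, 100), dtype=int)
        scribbles[50, 50] = 1                    # one click on the road
        scribbles[50, 10] = 2                    # one click on the left background
        scribbles[50, 90] = 2                    # one click on the right background
        mask = label_from_scribbles(gray, scribbles)
        print("road pixels:", int(mask.sum()))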

    Fuji-SfM dataset: A collection of annotated images and point clouds for Fuji apple detection and location using structure-from-motion photogrammetry

    The present dataset contains colour images acquired in a commercial Fuji apple orchard (Malus domestica Borkh. cv. Fuji) to reconstruct the 3D model of 11 trees by using structure-from-motion (SfM) photogrammetry. The data provided in this article are related to the research article entitled "Fruit detection and 3D location using instance segmentation neural networks and structure-from-motion photogrammetry" [1]. The Fuji-SfM dataset includes: (1) a set of 288 colour images and the corresponding annotations (apple segmentation masks) for training instance segmentation neural networks such as Mask R-CNN; (2) a set of 582 images defining a motion sequence of the scene, which was used to generate the 3D model of 11 Fuji apple trees containing 1455 apples by using SfM; (3) the 3D point cloud of the scanned scene with the corresponding apple position ground truth in global coordinates. This is therefore the first fruit detection dataset containing images acquired in a motion sequence to build the 3D model of the scanned trees with SfM, together with the corresponding 2D and 3D apple location annotations. These data allow the development, training, and testing of fruit detection algorithms based on RGB images, on coloured point clouds, or on a combination of both types of data. Primary data associated with the article http://hdl.handle.net/10459.1/68505. This work was partly funded by the Secretaria d'Universitats i Recerca del Departament d'Empresa i Coneixement de la Generalitat de Catalunya (grant 2017 SGR 646), the Spanish Ministry of Economy and Competitiveness (project AGL2013-48297-C2-2-R) and the Spanish Ministry of Science, Innovation and Universities (project RTI2018-094222-B-I00). Part of the work was also developed within the framework of the project TEC2016-75976-R, financed by the Spanish Ministry of Economy, Industry and Competitiveness and the European Regional Development Fund (ERDF). The Spanish Ministry of Education is thanked for Mr. J. Gené's pre-doctoral fellowship (FPU15/03355).
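
    As a hypothetical preprocessing step for training an instance segmentation network such as Mask R-CNN on this kind of data, the sketch below derives per-apple binary masks and bounding boxes from an instance-labelled mask; the integer-id mask encoding and the toy example are assumptions for illustration, not the dataset's documented annotation format.

    import numpy as np

    def instances_to_boxes(instance_mask):
        """Turn an instance-labelled mask (0 = background, 1..N = apple ids)
        into per-apple binary masks and [x_min, y_min, x_max, y_max] boxes,
        the kind of inputs Mask R-CNN-style detectors are trained on."""
        boxes, masks = [], []
        for apple_id in np.unique(instance_mask):
            if apple_id == 0:
                continue
            m = instance_mask == apple_id
            ys, xs = np.nonzero(m)
            boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])
            masks.append(m)
        return np.array(boxes), (np.stack(masks) if masks else np.empty((0,)))

    if __name__ == "__main__":
        toy = np.zeros((80, 80), dtype=int)
        toy[10:20, 10:20] = 1                  # apple 1
        toy[40:55, 30:50] = 2                  # apple 2
        boxes, masks = instances_to_boxes(toy)
        print(boxes)                           # one row per apple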

    The CAMOMILE collaborative annotation platform for multi-modal, multi-lingual and multi-media documents

    In this paper, we describe the organization and the implementation of the CAMOMILE collaborative annotation framework for multimodal, multimedia, multilingual (3M) data. Given the versatile nature of the analyses that can be performed on 3M data, the structure of the server was kept intentionally simple in order to preserve its genericity, relying on standard Web technologies. Layers of annotations, defined as data associated with a media fragment from the corpus, are stored in a database and can be managed through standard interfaces with authentication. Interfaces tailored specifically to the task at hand can then be developed in an agile way, relying on simple but reliable services for the management of the centralized annotations. We then present our implementation of an active learning scenario for person annotation in video, relying on the CAMOMILE server; during a dry-run experiment, the manual annotation of 716 speech segments was propagated to 3504 labeled tracks. The code of the CAMOMILE framework is distributed as open source.
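
    A toy, in-memory illustration of the "layers of annotations attached to media fragments" data model described above; it is not the CAMOMILE server's actual schema or REST API, and the class and field names are invented.

    from dataclasses import dataclass, field
    from typing import Any, Dict, List

    @dataclass
    class Annotation:
        """One annotation = some data attached to a fragment of a medium
        (here a simple time interval)."""
        medium: str
        start: float
        end: float
        data: Dict[str, Any]

    @dataclass
    class Layer:
        """A layer groups annotations of the same kind (e.g. 'speaker names')."""
        name: str
        annotations: List[Annotation] = field(default_factory=list)

        def in_fragment(self, medium: str, start: float, end: float):
            """Return annotations overlapping the requested media fragment."""
            return [a for a in self.annotations
                    if a.medium == medium and a.start < end and a.end > start]

    if __name__ == "__main__":
        speakers = Layer("speaker names")
        speakers.annotations.append(Annotation("episode_01", 12.0, 15.5, {"name": "person_a"}))
        speakers.annotations.append(Annotation("episode_01", 40.0, 42.0, {"name": "person_b"}))
        print(speakers.in_fragment("episode_01", 10.0, 20.0))   # -> person_a's annotation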

    Corpus selection

    Deliverable of the project Collaborative Annotation of multi-MOdal, MultI-Lingual and multi-mEdia documents (CAMOMILE). This document describes the different corpora that will be used during the CAMOMILE project.

    KFuji RGB-DS database: Fuji apple multi-modal images for fruit detection with color, depth and range-corrected IR data

    This article contains data related to the research article entitled 'Multi-modal Deep Learning for Fruit Detection Using RGB-D Cameras and their Radiometric Capabilities' [1]. The development of reliable fruit detection and localization systems is essential for future sustainable agronomic management of high-value crops. RGB-D sensors have shown potential for fruit detection and localization since they provide 3D information together with color data. However, the lack of substantial datasets is a barrier to exploiting these sensors. This article presents the KFuji RGB-DS database, which is composed of 967 multi-modal images of Fuji apples on trees captured using a Microsoft Kinect v2 (Microsoft, Redmond, WA, USA). Each image contains information from three different modalities: color (RGB), depth (D) and range-corrected IR intensity (S). Ground-truth fruit locations were manually annotated, labeling a total of 12,839 apples in the whole dataset. The dataset is publicly available at http://www.grap.udl.cat/publicacions/datasets.html. This work was partly funded by the Secretaria d'Universitats i Recerca del Departament d'Empresa i Coneixement de la Generalitat de Catalunya, the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (ERDF) under Grants 2017 SGR 646, AGL2013-48297-C2-2-R and MALEGRA, TEC2016-75976-R. The Spanish Ministry of Education is thanked for Mr. J. Gené's pre-doctoral fellowship (FPU15/03355). We would also like to thank Nufri and Vicens Maquinària Agrícola S.A. for their support during data acquisition.
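
    A hedged sketch of how the three modalities could be stacked into a single 5-channel input for a multi-modal detector; the per-modality min-max normalisation and the synthetic arrays are illustrative assumptions, not the procedure used in the associated paper.

    import numpy as np

    def stack_rgbds(rgb, depth, ir_s):
        """Stack the three modalities into one (H, W, 5) array:
        RGB + depth + range-corrected IR intensity."""
        def norm(x):
            x = x.astype(np.float32)
            span = x.max() - x.min()
            return (x - x.min()) / span if span > 0 else np.zeros_like(x)
        return np.dstack([norm(rgb), norm(depth)[..., None], norm(ir_s)[..., None]])

    if __name__ == "__main__":
        h, w = 240, 320
        rng = np.random.default_rng(0)
        rgb = rng.integers(0, 255, (h, w, 3), dtype=np.uint8)     # color modality
        depth = rng.uniform(0.5, 5.0, (h, w)).astype(np.float32)  # metres
        ir_s = rng.uniform(0, 1, (h, w)).astype(np.float32)       # corrected IR intensity
        x = stack_rgbds(rgb, depth, ir_s)
        print(x.shape, x.dtype)                                   # (240, 320, 5) float32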

    Segmentation of video sequences and rate control

    This paper deals with the relation between segmentation for coding and rate control. The efficiency of a segmentation-based coding scheme relies heavily on this step, which defines how many and which regions have to be segmented. In this paper, we show that this problem can be formulated as a rate-distortion problem. The proposed solution not only controls the segmentation, but also defines the coding strategy to be used in each region. Together with the general approach, several simplified versions of the segmentation control are proposed and discussed.
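
    A toy illustration of the rate-distortion formulation: each region has several candidate coding strategies with an associated rate and distortion, and the controller picks, per region, the strategy minimising D + λR. The region names, strategy names and numbers are invented for the example.

    def choose_coding(regions, lam):
        """Lagrangian rate-distortion choice: for each region, pick the coding
        strategy minimising D + lambda * R. `regions` maps region id -> list of
        (strategy_name, rate_bits, distortion) candidates."""
        total_rate, total_dist, choice = 0.0, 0.0, {}
        for rid, candidates in regions.items():
            name, rate, dist = min(candidates, key=lambda c: c[2] + lam * c[1])
            choice[rid] = name
            total_rate += rate
            total_dist += dist
        return choice, total_rate, total_dist

    if __name__ == "__main__":
        regions = {
            "sky":   [("merge_with_neighbour", 0, 40.0), ("texture_code", 300, 5.0)],
            "house": [("merge_with_neighbour", 0, 90.0), ("texture_code", 500, 8.0)],
        }
        for lam in (0.01, 0.5):                        # small lambda favours quality,
            print(lam, choose_coding(regions, lam))    # large lambda favours low rate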

    UPC system for the 2016 MediaEval multimodal person discovery in broadcast TV task

    The UPC system works by extracting monomodal signal segments (face tracks, speech segments) that overlap with the person names overlaid in the video signal. These segments are directly assigned the name of the person and used as a reference against which the non-overlapping (unassigned) signal segments are compared. This process is performed independently on the speech and video signals. A simple fusion scheme is then used to combine both monomodal annotations into a single one.
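
    A minimal sketch of the overlap-based name assignment step, assuming (start, end) time spans for both the monomodal tracks and the overlaid-name segments; the one-second overlap threshold and the example names are illustrative assumptions.

    def overlap(a, b):
        """Temporal overlap (in seconds) between two (start, end) segments."""
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    def name_segments(tracks, overlaid_names, min_overlap=1.0):
        """Assign an overlaid person name to every track (face track or speech
        segment) that overlaps it by at least `min_overlap` seconds. Unmatched
        tracks stay unnamed and would later be compared against these named
        reference segments."""
        named = {}
        for tid, span in tracks.items():
            best = max(overlaid_names, key=lambda n: overlap(span, n[0]), default=None)
            if best and overlap(span, best[0]) >= min_overlap:
                named[tid] = best[1]
        return named

    if __name__ == "__main__":
        tracks = {"face_0": (10.0, 18.0), "speech_3": (40.0, 45.0)}
        overlaid = [((12.0, 16.0), "person_a"), ((100.0, 104.0), "person_b")]
        print(name_segments(tracks, overlaid))   # only face_0 gets a name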